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We develop a general method to explore how the function performed by a biological network can 
constrain both its structural and dynamical network properties. This approach is orthogonal to prior 
studies which examine the functional consequences of a given structural feature, for example a scale 
free architecture. A key step is to construct an algorithm that allows us to efficiently sample from a 
maximum entropy distribution on the space of boolean dynamical networks constrained to perform 
a specific function, or cascade of gene expression. Such a distribution can act as a "functional null 
model" to test the significance of any given network feature, and can aid in revealing underlying 
evolutionary selection pressures on various network properties. Although our methods are general, 
we illustrate them in an analysis of the yeast cell cycle cascade. This analysis uncovers strong 
constraints on the architecture of the cell cycle regulatory network as well as significant selection 
pressures on this network to maintain ordered and convergent dynamics, possibly at the expense of 
sacrificing robustness to structural perturbations. 

PACS numbers: 87.10.+e, 87.17.Aa 



I. INTRODUCTION 

A central problem in biology involves understanding 
the relationship between structure and function. This 
problem becomes especially intricate in the era of sys- 
tems biology in which the objects of study are biolog- 
ical networks composed of large numbers of interacting 
molecules. To what extent does the structure of a biolog- 
ical network constrain the range of functions, or types of 
dynamical behaviors, that the network is capable of pro- 
ducing? Conversely, to what extent does the requirement 
of carrying out a specific function constrain the structural 
and more general dynamical properties of a network? 

There already exists a large body of theoretical work 
addressing the former question. For example Kauffman 
[T] and others performed an extensive study of ensem- 
bles of simplified boolean networks with fixed structural 
properties, such as the number of nodes and the mean 
degree of connectivity. A principal finding was a phase 
transition in the resulting dynamical behavior, from or- 
dered to chaotic, as the connectivity increased [2J. More 
recently, the observation that many biological networks 
are scale free spurred a flurry of research into the dy- 
namical consequences of the scale free structural feature 
[H m El El [7]. A principal finding was that the scale- 
free architecture is more robust to random failures and 
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dynamic fluctuations, and may be more evolvable. 

Alternatively, the latter question involving the struc- 
tural and dynamical consequences of performing a spe- 
cific function remains relatively unexplored [H [9j [10] . It 
is an important question because many biological func- 
tions are performed by relatively small network modules 
for which gross statistical properties such as mean con- 
nectivity, or any kind of degree distribution, scale free 
or not, do not have a clear significance. A key example 
of such a module is the yeast cell cycle control network, 
whose essential function was reduced to the boolean dy- 
namics of a set of 11 nodes by Li, et. al. [11]. Despite 
the small network size, a dynamical analysis of this sim- 
ple model demonstrated a great deal of robustness of the 
cell cycle trajectory to both fluctuations in protein states 
and perturbations of network structure. Is this robust- 
ness carefully selected for through evolution and encoded 
somehow in the topological structure of the cell cycle net- 
work, or does it arise for free, simply as a consequence of 
the functional constraint of having to produce the long 
cascade of gene expression that controls the cell cycle? 

In this paper we develop techniques to address this 
question, and more generally to address the consequences 
of specific functional, rather than structural constraints. 
A key step in the above work exploring the dynamical 
consequences of fixed structural constraints was the abil- 
ity to efficiently sample from the maximum entropy dis- 
tribution on the space of networks constrained to have 
a fixed structural feature, such as a given degree distri- 
bution. We call such an ensemble a structural ensemble. 
Similarly, in order to address the structural consequences 
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of fixed functional constraints, we develop an efficient al- 
gorithm to sample from a maximal entropy distribution 
on the space of biological networks constrained to per- 
form a specific function. We call the resulting ensemble 
of networks a functional ensemble. 

Since we use a maximal entropy distribution, which in- 
troduces no further assumptions about the network other 
than the fact that it performs a given function, we can 
use such functional ensembles to test whether any dy- 
namical or structural property of a given biological net- 
work is simply a consequence of the function it performs, 
or rather a consequence of further selection. Under this 
functional null model, statistically significant properties 
of a real biological network can give us insight into ad- 
ditional selective pressures that have operated on that 
network above and beyond the baseline functional re- 
quirements. As an illustration of this general method, 
we focus on the specific case of the cell cycle network 
of uncovering deeper insight into the evolutionary 
pressures on its structural and dynamical properties. 

II. METHODS 

A. The Dynamical Model 

We consider a simplified, boolean model of biological 
network dynamics, in which each degree of freedom, or 
node Si, i = 1 ... TV takes one of two values at any given 
time: either (inactive) or 1 (active). For example, the 
two values could signify whether a protein is expressed 
or not, or whether a kinase is activated or not. Thus the 
full state at time t is captured by a column vector 

S{t)^is,{t),S2{t),...,SN{t)f (1) 

that can take one of 2^ values. Time progresses in dis- 
crete steps, and the nodes can either activate or inhibit 
each other at the next step. These interactions are cap- 
tured by the network connectivity matrix C with ele- 
ments Cij representing an interaction arrow from node j 
to node i. The allowed values of Cy are given by 

[-1,0,1]. (2) 

For two (possibly identical) nodes i and j, if Cij is 
nonzero, it can be either activating (-1-1) or inhibiting 
(-1). This terminology is justfied by the dynamical rule 

s,it + l)^MC,Sit)), (3) 

where 

(h EjC.js,(t)>o 

/.(C,S(i))= <^ 0, E,c.,s,(t)<0 (4) 

Essentially, if the total input to a node is positive (neg- 
ative), it will be on (off) at the next time step. In the 
case of zero input, the node maintains its state. 



B. Generating Functional Ensembles 

A biological network achieves its function by success- 
fully taking the values of its nodes through a sequence of 
states. Thus we will equate the notion of function with 
a specified state trajectory 

s(o)-.s(i)^...^s(r). (5) 

In our example of the cell cycle network, the state se- 
quence is simply the natural cell cycle trajectory. We 
wish to either enumerate or uniformly sample from the 
space of networks, or equivalently connectivity matrices 
C, that can successfully perform the above sequence of 
T state transitions. We can think of each of the T tran- 
sitions as providing one constraint on the connectivity 
matrix C via the dynamical rule given by equations ([s]) 
and ([4]). Assuming the nodes are distinguishable, the 
number of networks, given by the number of allowed 
connectivity matrices is M = 3^ . Even for small, meso- 
scopic scale networks such as the cell cycle with 11 nodes, 
M w 5.39 X 10^'' and so it is computationally infeasible 
to iterate through all of these networks and find those 
for which the T constraints corresponding to the T tran- 
sitions in (|5| are satisfied. Even if one were to sample 
from these Al networks, one would rarely find a network 
that could perform the function in ([s]) . 

However, it is important to note that the constraint 
on the network connectivity C imposed by a given tran- 
sition actually decouples across the rows of the connec- 
tivity matrix. That is for each node z, the dynamical rule 
in Eq. Q depends only on the «'th row Cij,j = 1 . . . 
of C, or equivalently on the N incoming interaction ar- 
rows whose target is node i. Thus we can check that 
the T constraints induced by the target sequence ([5| are 
satisfied for each row, independently of the other rows. 
For any given i, the number of possible rows is Z = 3^, 
which is 177,147 for the yeast cell cycle network. Thus it 
becomes computationally feasible to exhaustively iterate 
through all possible row values, or incoming arrow combi- 
nations, for each node, and check that the T constraints 
are satisfied for each such combination. 

After following this procedure using the cell-cycle pro- 
cess as the constraint, let I < cm < Z index the set 
of allowed incoming arrow combinations to node i that 
satisify all constraints. If for each node i, we uniformly 
choose a particular aj, and assemble the corresponding 
N allowed rows into a matrix C, we will have constructed 
a network that can successfully carry out the state tra- 
jectory in ([5|. This is essentially our sampling procedure. 
It produces a functional ensemble: a maximum entropy 
distribution on the space of networks constrained to pro- 
duce the function represented by (|5|. 

C. Combined structural and functional ensembles. 

In order to perform a more fine scale study of the prop- 
erties of the yeast cell cycle network, we wish to con- 
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strain more than just a predefined function. We would 
also like to understand how various properties depend on 
the number of nonzero interaction arrows in the network. 
Thus we need to develop a method to uniformly sample 
from the space of networks that both perform a fixed 
function and have a fixed number of arrows. However, 
although the algorithm presented in the previous section 
uniformly samples the space of networks performing a 
fixed function, when one adds a constraint on the num- 
ber of arrows, this algorithm performs a biased sampling 
of the more constrained space of networks. 



To see the origin of this bias, consider an implemen- 
tation of the algorithm. Let A be the desired number of 
arrows in the network. Fix an arbitrary ordering of the 
nodes i = 1 . . . N. Choose node 1 and pick at random 
one incoming arrow combination from a uniform distri- 
bution on the allowed incoming arrow combinations to 
node 1. Let ai be the number of nonzero interaction ar- 
rows in the chosen combination. We then have A ~ ai 
incoming arrows left to distribute to nodes 2 . . . N . If 
the algorithm continues in this way all the way to node 
N, one can see that the resulting sampling of networks 
is biased in such a way that earlier nodes in the order 
have a higher likelihood of receiving more incoming ar- 
rows relative to later nodes. For example, in the extreme 
case, the algorithm may run out of arrows even before 
it reaches the final node N. To correct for this bias, at 
each step of the algorithm corresponding to each node i 
we must sample nonuniformly from a certain conditional 
distribution on the allowed incoming arrow combinations 
to node i. 



We perform this nonuniform sampling as follows. Sup- 
pose that by the time we have reached node i, we have 
incoming arrows left to distribute among the remaining 
nodes from i . . . N. By definition, ri ~ A, the number of 
desired arrows in our ensemble. To choose a particular 
incoming arrow combination to node i, we first randomly 
draw the number of incoming arrows ai we wish to as- 
sign to node i from a conditional probability distribution 
PiiO'ilfi) ■ Then we draw the particular combination from 
a uniform distribution on the set of allowed incoming ar- 
row combinations that have exactly nonzero arrows. 
P(ai\ri) is a conditional distribution on the number of 
nonzero incoming arrows to node i, conditioned on the 
total number of incoming arrows to the remaining unas- 
signed nodes i . . . N, and it can be computed as follows. 
Let Wi{ai) be the number of allowed incoming arrow 
combinations to node i that have exactly Ui nonzero ar- 
rows. Let (5i_|_i(6) be the number of ways to distribute b 
incoming arrows (consistent with the fixed function the 
network must perform) to the remaining nodes i+l . . . N. 
Then for each choice ai of the number of incoming arrows 
to node i, the quantity Wi{ai)Qijf.i(ri — ai) is the num- 
ber of allowed ways to complete the network. P(ai\ri) is 



simply proportional to this number: 



Piiai\ri) 



Wi{a^)Qt+i{ri - ai) 



E 



h=0 



W^{b)Q,+i{n-b)' 



i = I...N-1. 



(6) 



Intuitively, the combinatoric factor Qi+i{ri — ai) in the 
definition of of the conditional distribution Pi{ai\ri) cor- 
rects for the biased sampling mentioned above by forcing 
the algorithm to choose uniformly from the set of possible 
completions of the network, rather than simply uniformly 
from the set of allowed incoming arrows to node i. For 
the special case of the last node i = N, when we have 
rjv arrows left to distribute, we simply choose uniformly 
from the allowed space of incoming arrow combinations 
to node N that have exactly rjv nonzero arrows. Fur- 
thermore, each time we run the algorithm to obtain a 
network, we randomize the ordering on the nodes. 

For any fixed ordering of the nodes, the combinatoric 
factors Qi{b) are crucial to the success of the sampling 
algorithm. We note that these factors can be computed 
efficiently by working recursively from node i — N back 
down to I = 1 . More precisely, if we define 

Qjv(&) = WN{b), (7) 
then for each i — 1 . . . N— 1 we have the recursion relation 



Qi{b) = 



min{N ,b) 

E 

Q = 



W,ia)Q,+iib-a). 



(8) 



Thus this algorithm gives us an efficient way to sample 
from a combined structural and functional ensemble: a 
maximum entropy distribution on the space of networks 
constrained to have a fixed function and a fixed number 
of arrows. It is this algorithm that we use to generate 
the ensemble of "cell cycle" networks described below. In 
order to compare this ensemble to a set of more random 
networks that serve no particular function, we generated 
this "random network" ensemble by randomly rewiring 
the connections in each "cell-cycle" network under the 
constraint that all nodes must be connected to the same 
network, i.e. no isolated nodes or sub-networks. 



III. THE YEAST CELL-CYCLE NETWORK 

The simplified yeast cell-cycle Boolean network [see 
Fig. [l] given in Li et al. [TT] contains 11 proteins, or 
nodes, and 1 checkpoint. There are 34 arrows connect- 
ing the nodes: 15 activating and 19 deactivating ("self- 
degrading" arrows are equivalent to deactivating arrows 
under our dynamical model) . Using the above dynamical 
model, this network can produce the cell-cycle process, as 
shown in Table |T] Starting from the "excited" Gi state, 
the process goes through the S phase, the G2 phase, the 
M phase, and finally returns to the biological Gi station- 
ary state. The network also has 7 fix-points, with the Gi 
stationary state being the biggest, having a basin size of 
1764 (w 86% of all protein states). 
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FIG. 1; Simplified yeast cell-cycle network from Li et al. [11] . 
Each node represents a gene or its protein product that par- 
ticipates in the regulation of the yeast cell-cycle. There are 
cyclins (Clnl,2, Cln3, Clbl,2, Clb5,6), transcription factors 
(SBF, MBF, Mcml/SFF, Swi5), and factors that inhibit or 
degrade cyclins (Sicl, Cdhl, Cdc20). Solid and dashed arrows 
are activating and deactivating interactions, respectively. The 
black solid and black dashed arrows are absolutely required 
for a network to produce the cell-cycle process [see Table IT]. 



IV. RESULTS 

We used the same 11 nodes and the types of connec- 
tions in the simplified yeast cell-cycle Boolean network 
to construct our ensembles of networks. Using our tech- 
nique to generate purely functional ensembles, we were 
able to select 11 sets of inward connection combinations 



(one for each node) that produce the cell-cycle process 
[see Table Figure [2] shows the compositions of different 
types of connections in the sets. The number of selected 
connection combinations for each node (shown in paren- 
thesis in Fig. [2]) varies for two orders of magnitude. The 
number of networks that can realize the cell-cycle process 
is 5.11 X lO'^* and the distribution against the number of 
arrows is shown in Fig. [3] 



A. Constraints on Structure from Function. 

From Fig.|2]we can deduce that there are 10 core con- 
nections (shown as black solid and black dashed arrows 
in Fig. [1]) that are absolutely required in order to pro- 
duce the cell-cycle process. These required connections 
become obvious once we look closer into the cell-cycle 
process. For example, comparing the Gi stationary state 
and the "excited" Gi state, Cln3 is the only node that 
is turned on; this implies that MBF and SBF, which are 
turned on in the next time step, can only be activated 
by Cln3. The remaining required connections can all be 
deduced using the same logic. The compositions of differ- 
ent connection types follow a common trend where there 
is a higher chance for a node to be positively regulated 
by nodes that are active earlier in the cell-cycle process 
(positive feed-forward) and negatively regulated by nodes 
that are active later in the process (negative feedback) 
[see Fig. [4]. This trend seems to be general for networks 
that produce cascades of activation. To check this, we 
looked at the connection compositions for two simple ac- 
tivation cascades, where 11 nodes are activated in turn 
for 4 or 5 time steps, and they indeed show the same 
trend [See Table and Fig. [5] for 5 time steps activation 
cascade] . 

Next, we generated two network ensembles for each 
number of arrows allowed in the space of cell cycle net- 
works. This number of arrows varied from 24 to 117 



5 



Cln3 (16025) 



MBF (683) 



SBF (683) 



Cdh1 



Cln3 



Hi 





%W<<«A%V*A. %W<<«A%'\.%\ 

Cln1,2 (3031) Clb5,6(2106) Clb1,2 (806) 




Mcml (504) 



Cdc20 (2161) 




<iP%*'^«^ «^., %■> "'^^^ * ^-^^-^^Vw*® 

Sid (1584) Cdhi (453) 




■•? ^/^o ^ 

Swi5 (1701) 



n activating 
□ no connection 
■ deactivating 



FIG. 2: Fractions of difTerent types of inward connections to 
each node from all other nodes (including itself) that produce 
the cell-cycle process [see Table |l] . Numbers in parenthesis 
are the number of connection combinations selected by our 
algorithm [see Methods]. 
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FIG. 3: Distribution of number of networks that can real- 
ize the cell-cycle process over the number of arrows in the 
networks. 



as seen in Fig. [3] The first ensemble is a combined 
structural/functional ensemble [see Methods] consisting 
of 1,000 networks that both realize the cell cycle and have 
a fixed number of arrows. These networks will be re- 
ferred to as the "cell-cycle networks" (CN). The second 
ensemble was generated by randomly reconnecting the 
arrows for each network in the first ensemble [see Meth- 
ods]. This ensemble will be referred to as the "random 
networks" (RN). 
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FIG. 4: Positive feed-forward and negative feedback connec- 
tions between network nodes. Solid and dashed arrows are 
the most probable activating and deactivating interactions, 
respectively, selected from Fig. [2] The activating arrow for 
Cln3 is not a clear winner, therefore colored gray. Nodes are 
placed in the order of their activation in the cell-cycle process, 
starting from Cln3 and continues clockwise, SBF and MBF 
turn on at the same time, and similarly for Mcml/SFF and 
Clbl,2. Positive feed-forward interactions connecting Cln3 
to Clnl,2, Cln3 to Clb5,6 and Clb5,6 to Mcml/SFF can be 
seen. Negative feedback connections from later nodes to ear- 
lier nodes are also obvious. These interactions will be more 
common when more arrows are added. This network (less the 
gray arrow) does not produce the cell-cycle process but one 
very close to it. 
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the activation cascade in Table Ull 
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TABLE II: A simple activation cascade with 11 nodes acti- 
vated in turn for 5 time steps 
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FIG. 6: Number of attractors and basin sizes of attractors for 
the two ensembles of networks, (a) Distribution of number of 
attractors. Inset shows the distribution for < 10 attractors 
(b) Number of attractors averaged over networks with the 
same number of arrows, (c) Distribution of size of basins of 
attractions, (d) Basin size averaged over networks with the 
same number of arrows. 



B. Analysis of Attractors: Large Basins for Free 

We studied the time evolution of protein states of the 
two ensembles by using the dynamical model described 
in the Methods and initiating the networks from each 
of the 2" = 2, 048 states. We found that the CN net- 
works have fewer attractors and larger attractor basin 
sizes compared to the RN networks [see Fig. [6] (a),(c)]. 
The number of attractors decreases as the number of ar- 
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FIG. 7: (a) Distribution of convergence value w„ for network 
states in each ensemble, (b) Overall convergence W averaged 
over networks with the same number of arrows. Error bar 
represent one standard error. 



rows increases, reaching a minimum at around 100 arrows 
and then increases again [see Fig. [6](b)]. The two curves 
cross each other at about 60 arrows, therefore, compared 
to RN networks, CN networks have fewer attractors when 
the number of arrows is small and more attractors when 
the number of arrows is large. The behavior is reversed 
for the size of attractor basins [see Fig.|6](d)]. In the CN 
ensemble, the probabilities for a network with 34 arrows 
to have < 7 attractors and to have the biggest attrac- 
tor basin size > 1764 (as in the case of yeast cell-cycle 
network) are 4.1% and 7.5% respectively. In the RN en- 
semble the corressponding percentages are only 1.5% and 
0.7% respectively. Thus we see that the constraint of hav- 
ing to perform the yeast cell cycle cascade alone can, to a 
certain extent, help explain the origins of these two mea- 
sures of dynamical robustness; a large basin essentially 
arises for free as a consequence of the cell cycle function. 
Indeed, the average basin size of the biggest attractors 
for the CN ensemble is 1397 compared to 1045.53 for the 
RN ensemble. In addition, 95.3% of the networks in the 
CN ensemble have the Gi stationary state as the biggest 
attractor and the average basin size of these attractors is 
1536.92. 



C. Convergence of Trajectories. 

Following |llj . we define a measure Wn that quanti- 
fies the "degree of convergence" of the dynamical net- 
work trajectories onto each network state n where n = 
1, . . . ,2048. Let Tj,k denote the number of trajectories 
starting from all 2048 initial network states that travel 
from state j to state k in one time step. Let L„ denote the 
number of steps it takes to get from state n to its attrac- 
tor, so that we can index the states along the outward tra- 
jectory by /c = 1, . . . ,L„. Then Wn = Y.k=i Tk-i.k/Ln- 
The overall convergence, or overlap W of trajectories is 
given by the average of w„ over all states n. 

The distribution of the w„ values is shown in Fig. [7] 
(a). The result shows that there are more states in the 
CN ensemble having larger Wn values indicating a higher 
degree of convergence in the network dynamics. The lo- 
cal maxima at Wn = 559 for this ensemble should be a 
result of the requirement that networks in this ensemble 
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FIG. 9: Network sensitivity averaged over networks with the 
same number of arrows. Asterisk (*) shows the network sen- 
sitivity of the yeast cell-cycle network. Error bar represent 
four times standard error. 



must produce the cell-cycle process. The overall over- 
lap W [see Fig. [T] (b)], which is the average of w„ over 
all network states, for the yeast cell-cycle network is 743 
and the probability for a network with 34 arrows in the 
CN ensemble to have W > 743 is 3.1%. Such a result 
is highly unlikely in the RN ensemble. In fact no net- 
works in the RN ensemble with 34 arrows had an over- 
lap W > 7 A3. Thus a higher degree of convergence is a 
dynamical consequence of performing the cell cycle func- 
tion, but nevertheless, the actual cell cycle network in 
Fig.jTjstill displays a relatively high degree of convergence 
even within the CN ensemble. The statistical significance 
of the yeast cell cycle's overlap parameter W within the 
functional cell cycle ensemble suggests the existence of a 
strong selection pressure for convergent dynamics. 



D. Dynamical Order: Transients and Sensitivity. 

To compare the degree of order between the networks 
in the two ensembles, we calculated the transient time 
for all network states for all networks [see Fig. [8]. The 
transient time is defined as the amount of time for a net- 
work state to evolve to its attractor, which is equal to the 
length of its trajectory [T^]. The result shows that CN 
networks have longer transient times and thus are more 
chaotic than RN networks (unless the RN networks have 
long attracting limit cycles). The average transient time 
for the yeast cell-cycle network is 7.47, and the probabil- 
ity for a cell-cycle network with 34 arrows to have > 7.47 
average transient time is 4.8%. 

We then calculated the network sensitivity s [13] for all 
networks [see Fig. [9]. Network sensitivity is the average 
expected number of node state changes in the output 
given a one node state change in the input. In other 
words, s calculates the average hamming distance of the 
output states of the network for all hamming neighbor 
input states {i.e. hamming distance =1). If s < 1, 
the network is ordered; fluctuations in node states die 
out quickly and remain localized. If s > 1, the net- 
work is "chaotic" ; fluctuations are amplified at least on 
short time scales. On longer time scales, which are not 
captured, especially for small networks, by this one step 
measure of sensitivity, the network may not be chaotic 
and could still converge to a stable fixed point. When 
s = 1, the network is in some sense critical. 

The result in Fig. |9] indicates that networks in both 
ensembles are on average chaotic, or more precisely, dis- 
play a significant degree of short time scale sensitivity to 
input perturbations, for any number of arrows within the 
range we studied. The sensitivity s increases monotoni- 
cally with the number of arrows in a network. The values 
of s for the RN ensemble remain smaller than those for 
the CN ensemble demonstrating that CN networks are 
indeed more chaotic, or sensitive to input perturbations, 
than RN networks. The yeast cell-cycle network has a 
network sensitivity of 1.27, which is more ordered on av- 
erage than other members in its ensemble. Indeed the 
probability for a network with 34 arrows in the CN en- 
semble to have s < 1.27 is only 3.1%. Thus the actual 
cell cycle is remarkably ordered, or insensitive to input 
perturbations, despite the fact that the functional con- 
straint of performing the cell cycle drives networks to be 
more sensitive to such perturbations on short time scales. 



E. Dynamical response to structural perturbations. 

To check and compare how the networks respond to 
structural perturbations, we performed the same kinds 
of alterations described in Li et al. [TT] on all networks 
in the two ensembles. The alterations included deleting 
arrows from, adding arrows to and switching the signs of 
arrows in the networks. We assessed the response by cal- 
culating the percentage of perturbed networks that retain 
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FIG. 10: Response to perturbations to network structure, (a) Fractions of perturbed networks, for each number of arrows, 
that retain the original biggest attractor. Asterisk (*) shows the fraction for the perturbed yeast cell-cycle network. Error bars 
represent one standard error, (b) Distribution of relative changes of the size of the basin of attraction for the biggest attractor. 



their original biggest attractor [see Fig. 10 (a)] and also 



the relative change in the size B of the basin of attraction 
for the original biggest attractors [see Fig. 10 (b)]. The 
percentages of perturbed networks in the two ensembles 
that retain their original biggest attractor both increase 
initially when the number of arrows in the network is 
small. The percentages are similar when there are about 
24 to 33 arrows in the networks but as the number of 
arrows exceeds 33, the percentage for the RN ensemble 
becomes significantly greater compared to that for the 
CN ensemble. The two percentage become similar again 
when the number of arrows increases beyond 83 and sep- 
arate once more at 144 arrows but the percentage for CN 
ensemble is greater this time. The percentage for yeast 
cell-cycle network (65.7%) is smaller than the average for 
CN networks with the same number of arrows. The prob- 
ability to obtain an equal or higher percentage is 80.7%, 
indicating that the yeast cell-cycle network has a worse 
than average robustness with respect to such structural 
perturbations. 



We noticed from the distributions of AB/B that there 
is a higher chance for perturbations to have a deleterious 
effect to networks in the CN ensemble where the change 
in the sizes of basins of attraction is usually negative. 
However, there is a much higher chance for networks in 
the RN ensemble to completely lose the original biggest 
attractor (AB/B = — 1), which is even more unfavorable. 
The above effects should be attributed to the smaller 
basins of attraction for networks in the RN ensemble. 
The average AB/B for yeast cell-cycle network is -0.342 
and the probability for a CN network with 34 arrows to 
have average AB/B value > —0.342 is 89%. This again 
signifies a worse than average robustness for the yeast 
cell-cycle network. 
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FIG. 11: Fractions of trajectories of the perturbed networks 
starting from the "excited" Gi state that still evolve to the 
biological Gi stationary state and fraction of cell-cycle pro- 
cesses of the perturbed networks remain unchanged. Asterisk 
(*) and cross (x) show the fraction of trajectories reaching 
Gi stationary state and fraction of cell-cycle processes un- 
changed, respectively, for perturbed yeast cell-cycle networks. 
Error bars reprensent four times standard error. 



F. Stability of the cell cycle process. 

Finally, we checked how many perturbed networks 
from the CN ensemble could still maintain the cell-cycle 
process [see Fig. 11 . Starting from the "excited" Gi 
state, the fraction of trajectories reaching the Gi sta- 



tionary state and the fraction of cell-cycle processes un- 
changed increase as the number of arrows in the network 
increases. The fraction of trajectories reaching the Gi 
stationary state and the fraction of cell-cycle processes 
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unchanged for the yeast cell-cycle network are 0.52 and 
0.22 respectively. The probability for a CN network with 
34 arrows to maintain > 52% of trajectories reaching the 
Gi stationary state and > 22% of cell-cycle processes 
unchanged are 79% and 61.2% respectively. 

V. DISCUSSION 

We presented a maximum entropy analysis method 
that can reveal the underlying structural constraints, as 
well as the statistical significance of various dynamical 
properties, of networks that perform a certain function. 
We applied this method to the yeast cell-cycle network 
and the accompanying cell-cycle process [IT]. 

We demonstrated that requiring a network to produce 
an activation cascade, e.g. the cell-cycle process, requires 
the network to have positive feed-forward and negative 
feedback interactions between their nodes. It is not just 
the case that this is a good design principle to realize 
a long transient cascade; it is essentially the only way 
to achieve it generically, and yields an example of how 
network function constrains network structure. 

We also showed that certain dynamical features arise 
purely as a consequence of performing the cell-cycle func- 
tion. Compared to the random (RN) ensemble, networks 
in the cell-cycle (CN) ensemble had much larger basins 
of attraction, a higher degree of convergence of trajecto- 
ries, longer transient times, and more chaotic behavior 
on short time scales, as measured by the network sensi- 
tivity s. These properties may be essential for networks 
to produce a long sequence of state transitions. The long 
trajectory of the cell-cycle process provides many possi- 
ble merge points for other trajectories, which certainly 
contribute to the high degree of convergence in the net- 
work dynamics and the large basin of attraction for Gi 
stationary state. Thus the existence of this globally at- 
tracting trajectory of the dynamics alone can explain to a 
certain extent the observed robustness against dynamical 
perturbations over long time scales. 

On the other hand, with respect to structural pertur- 
bations, the actual yeast cell cycle is relatively less robust 
compared to other networks in the CN ensemble. This is 
in stark contrast to the high degree of dynamical order 
on short time scales displayed by the cell cycle network, 
which suggests that there may be a trade off between 
ordered dynamics and structural robustness. The net- 
work sensitivity [13j . which measures the degree of or- 
der, calculates how sensitive a network is to fluctuations 
in the states of the nodes, which is a major source of 
variation in a cell population [TH [T21 UHl HZ] • Evolution 



may have favored a design for the yeast cell-cycle net- 
work that is ordered and less sensitive to fluctuations in 
the states of the nodes {e.g. it has been reported that 
there is on average only 1 copy of SWI5 mRNA per cell 
in yeast |18p. by sacrificing robustness against pertur- 
bations to the network structure. However, we expect 
that the complete yeast cell-cycle network is more ro- 
bust against such perturbations since it has "redundant" 
components and connections that perform similar jobs. 

In any case, the observation that only 3.1% of ran- 
domly chosen cell cycle networks with 34 arrows are more 
ordered (as measured by the sensitivity s) than the real 
cell cycle network reveals an unsuspected but significant 
selection pressure for short time scale ordered dynamics 
that cannot be explained by the functional requirement 
of maintaining the cell cycle process; indeed the func- 
tional requirement of maintaining the cell cycle proccess 
forces the dynamics in the opposite direction, i.e. to be 
more chaotic on short time scales. Similarly, we have 
seen that simply requiring a long cell cycle to occur au- 
tomatically raises the average degree of convergence of 
trajectories over longer time scales. However, even after 
accounting for this increase within the functional ensem- 
ble, only 3.1% of all cell cycle networks with 34 arrows 
have a greater degree of convergence (as measured by the 
overlap parameter W), reflecting an evolutionary pres- 
sure for convergent dynamics on long time scales above 
and beyond that necessitated by the requirements of the 
cell cycle function alone. 

Although we have focused on a single biological exam- 
ple, the cell cycle, our analysis method is quite general. 
It would be interesting to perform it on other mesoscopic 
scale networks that have a comparable number of com- 
ponents to uncover their structural and dynamical con- 
straints. More generally, we believe these techniques of 
simultaneously analyzing both structural and functional 
ensembles of networks will prove useful in the larger quest 
to deduce general principles governing relations between 
structure, dynamics, function, robustness and evolution. 
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